Improving Parallel System Performance with a NUMA-aware Load Balancer

Authors

Laércio L. Pilla

Christiane Pousa Ribeiro

Daniel Cordeiro

Abhinav Bhatele

Philippe O. A. Navaux

Jean-François Méhaut

Laxmikant V. Kale

Abstract

Multi-core nodes with Non-Uniform Memory Access (NUMA) are now a common architecture for high-performance computing. On such NUMA nodes, the shared memory is physically distributed into memory banks connected by a network. As a result, memory access costs may vary depending on the distance between the processing unit and the memory bank. A key element in improving performance on these machines is therefore dealing with memory affinity. We propose a NUMA-aware load balancer that combines information about the NUMA topology with the statistics captured by the Charm++ runtime system. We present speedups of up to 1.8 for synthetic benchmarks running on different NUMA platforms. We also show improvements over existing load balancing strategies, both in benchmark performance and in the time taken for load balancing. In addition, by avoiding unnecessary migrations, our algorithm incurs up to seven times smaller migration overheads than the other strategies.

Keywords: load balancing, non-uniform memory access, memory contention, performance, object migration
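The idea of combining NUMA topology information with runtime load statistics can be illustrated with a minimal sketch. This is not the authors' algorithm and does not use the Charm++ API: the `Task` class, the `numa_aware_balance` function, the `alpha` weight, and the hop-count distance matrix are all hypothetical names and values chosen for illustration. The sketch assumes a greedy policy that places the heaviest task first and charges a penalty proportional to the NUMA distance from the task's current node.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    load: float  # measured execution time (hypothetical runtime statistic)
    core: int    # core the task currently runs on


def numa_aware_balance(tasks, core_to_node, distance, alpha=0.5):
    """Greedy sketch: place the heaviest task first on the cheapest core.

    cost(core) = load already assigned to that core
                 + alpha * task.load * distance[home_node][node_of(core)]
    where home_node is the NUMA node of the task's current core.
    All parameters are illustrative assumptions, not the paper's model.
    """
    core_load = {core: 0.0 for core in core_to_node}
    placement = {}
    for task in sorted(tasks, key=lambda t: t.load, reverse=True):
        home = core_to_node[task.core]

        def cost(core):
            penalty = alpha * task.load * distance[home][core_to_node[core]]
            return core_load[core] + penalty

        # Ties are broken in favor of the current core, then the lowest
        # core id, so a task is never migrated without a cost benefit.
        best = min(core_load, key=lambda c: (cost(c), c != task.core, c))
        core_load[best] += task.load
        placement[task.name] = best
    return placement, core_load


# Hypothetical machine: 4 cores, 2 NUMA nodes, distances in hops.
core_to_node = {0: 0, 1: 0, 2: 1, 3: 1}
distance = [[0, 1], [1, 0]]

# All tasks start on core 0, so the initial load is fully imbalanced.
tasks = [Task("a", 4.0, 0), Task("b", 3.0, 0),
         Task("c", 2.0, 0), Task("d", 1.0, 0)]
placement, core_load = numa_aware_balance(tasks, core_to_node, distance)
print(placement)
print(core_load)
```

In this sketch the two heaviest tasks stay on their home NUMA node, and only the lighter tasks are moved to the remote node, where the distance penalty is smaller in absolute terms. An already balanced placement is left untouched, which mirrors the abstract's point about avoiding unnecessary migrations.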

Similar resources

Topology-Aware Parallelism for NUMA Copying Collectors

NUMA-aware parallel algorithms in runtime systems attempt to improve locality by allocating memory from local NUMA nodes. Researchers have suggested that the garbage collector should profile memory access patterns or use object locality heuristics to determine the target NUMA node before moving an object. However, these solutions are costly when applied to every live object in the reference gra...

Task Parallel Models Based on Dynamic Data Placement to Reduce NUMA Effects

NUMA (Non-Uniform Memory Access) multicore computers have become popular in scientific and industrial fields due to their scalable memory performance. However, large-scale data-intensive computing on NUMA architectures faces data locality problems, called NUMA effects, that are caused by the overhead of accessing cross-node data. Our task parallel model is based on the strategy o...

Comparative Modeling and Evaluation of CC-NUMA and COMA on Hierarchical Ring Architectures

Parallel computing performance on scalable shared-memory architectures is affected by the structure of the interconnection networks linking processors to memory modules and on the efficiency of the memory/cache management systems. Cache Coherence Non-Uniform Memory Access (CC-NUMA) and Cache Only Memory Access (COMA) are two effective memory systems, and the hierarchical ring structure is an efficien...

NBA (Network Balancing Act)

We present the NBA framework, which extends the architecture of the Click modular router to exploit modern hardware, adapts to different hardware configurations, and reaches close to their maximum performance without manual optimization. NBA takes advantage of existing performance-excavating solutions such as batch processing, NUMA-aware memory management, and receive-side scaling with multi-qu...

Comparative Modeling and Evaluation of CC-NUMA and COMA on Hierarchical Ring Architectures

Parallel computing performance on scalable shared-memory architectures is affected by the structure of the interconnection networks linking processors to memory modules and on the efficiency of the memory/cache management systems. Cache Coherence Nonuniform Memory Access (CC-NUMA) and Cache Only Memory Access (COMA) are two effective memory systems, and the hierarchical ring structure is an eff...

Publication date: 2011
